ABSTRACT

Homogeneity analysis, or multiple correspondence analysis, is
usually applied to k separate variables. In this paper we apply it
to sets of variables by using sums within sets. The resulting
technique is called OVERALS. It uses the notion of optimal scaling,
with transformations that can be multiple or single. The single
transformations consist of three types: nominal, ordinal, and
numerical. The corresponding OVERALS computer program minimizes a
least squares loss function by using an alternating least squares
algorithm. Many existing linear and nonlinear multivariate analysis
techniques are shown to be special cases of OVERALS. An application
to data from an epidemiological survey is presented.

Keywords: homogeneity analysis, correspondence analysis, optimal 
scaling, transformation, alternating least squares, canonical 
correlation analysis, principal component analysis 
INTRODUCTION



Approximately ten years ago Young, De Leeuw, and Takane started
to apply the optimal scaling ideas that had originated in
multidimensional scaling to multivariate analysis. This made it
possible to link the developments in multidimensional scaling with
older but related developments in multivariate analysis centering
around the notion of coding categorical variables by using matrices
with zeroes and ones. The resulting ALSOS (Alternating Least
Squares with Optimal Scaling) approach to multivariate data
analysis was based on the idea of alternating the transformation or
quantification of variables with the fitting of model parameters in
an iterative way, using least squares loss functions. This resulted
in a series of programs for nonlinear multivariate analysis, with
special programs for additivity analysis, multiple regression,
canonical correlation analysis, principal component analysis, and
factor analysis. A review of the general ALSOS approach, and of the
results that have been obtained, is given by Young (1981).

The ALSOS approach to algorithm construction is quite general,
but the framework is a bit too narrow for some applications in
multivariate analysis, e.g. correspondence analysis (Benzecri et
al., 1973; Benzecri et al., 1980; Nishisato, 1980; Lebart,
Morineau, and Warwick, 1984; Greenacre, 1984). Although
correspondence analysis does not fit directly into the ALSOS
approach, it is still possible to relate it to the computational
developments in ALSOS. This has been done in considerable detail by
Gifi (1981), which is summarized briefly in De Leeuw (1984). In
this paper we discuss some of the more specific principles of
algorithm construction used by Gifi, and we apply them to OVERALS,
a very general nonlinear multivariate analysis technique, covering
both ALSOS and correspondence analysis.

The major feature of the Gifi system for nonlinear multivariate
analysis is that it takes homogeneity analysis as its starting
point. Homogeneity analysis, also known as multiple correspondence
analysis, is discussed in great detail in the references on
correspondence analysis mentioned above, and by Tenenhaus and Young
(1984). Gifi introduces homogeneity analysis as the cornerstone of
multivariate data analysis, and then specializes to other
multivariate techniques by imposing various forms of restrictions
on the parameters. Imposing restrictions is one way of dealing with
prior information. As a consequence the number of parameters is
reduced, which generally improves both the stability and the
interpretability of the solution. The most important restrictions
are the additivity restrictions. These are discussed in detail in
this paper in the section on sets of variables. In order to fit the
classical linear techniques smoothly into the system we also need
the rank-one restrictions, which can be combined with additivity
restrictions to produce a very general class of techniques. Finally,
measurement restrictions are built into the system, in much the
same way as in ALSOS. We shall treat these notions in more detail
in the section on rank-one restrictions and optimal scaling.

The technique that results if we minimize the general least
squares loss function of homogeneity analysis under the types of
restrictions mentioned above is called OVERALS. We have to be
careful here, because terminological confusion is possible at this
point. In the first place we discuss a restricted minimization
problem, which we call the OVERALS problem. In the second place we
propose an alternating least squares algorithm to solve this
minimization problem. This is called the OVERALS algorithm. And
thirdly we have written a FORTRAN computer program implementing
this algorithm. This is the OVERALS program. It is quite important
to keep these three meanings of the word OVERALS apart, although in
this paper the context will always indicate which one of the three
meanings we are using at any given moment.



HOMOGENEITY ANALYSIS 



Homogeneity analysis or multiple correspondence analysis is a
method to maximize the homogeneity of a number of variables
(Guttman, 1941; De Leeuw, 1973, chapter 3; Nishisato, 1980, chapter
5; Meulman, 1982; Lebart et al., 1984, chapter 6; Greenacre, 1984,
chapter 5). To define homogeneity analysis we need some notation.
Suppose we have an n x m multivariate data matrix, with rows
corresponding to objects and columns to variables. Assume that
variable j takes kj different values (has kj categories) and define
the matrix Gj as the n x kj indicator matrix corresponding to this
variable. An indicator matrix indicates which categories are scored
by which objects. Rows correspond to objects, columns to
categories. Its elements consist of zeroes (not scored) and ones
(scored).
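
As an illustration of the definition above, the following minimal sketch builds the indicator matrix of one categorical variable with NumPy. The function name `indicator_matrix` and the small data vector are hypothetical and are not part of the OVERALS program.

```python
import numpy as np

def indicator_matrix(values):
    """Return (G, categories): the n x k 0/1 indicator matrix of one variable."""
    categories = sorted(set(values))
    column = {c: i for i, c in enumerate(categories)}
    G = np.zeros((len(values), len(categories)), dtype=int)
    for row, v in enumerate(values):
        G[row, column[v]] = 1          # object `row` scores category `v`
    return G, categories

values = ["a", "b", "a", "c", "b", "a"]   # hypothetical scores of 6 objects
G, cats = indicator_matrix(values)
D = G.T @ G   # diagonal matrix holding the category frequencies
```

Each row of G contains exactly one 1, and G'G is diagonal with the category frequencies on its diagonal.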

Homogeneity analysis determines quantifications or
transformations of the categories of each of the variables such
that homogeneity is maximized. A definition of homogeneity follows.
Let us use the vector yj, with kj elements, for the
quantifications of the categories of variable j. The expression Gjyj
represents a single quantification or transformation of the n
objects, induced by variable j. Without further conditions on the
yj the quantification is restricted only by the ties in the data,
i.e. objects in the same category get the same quantification. In
homogeneity analysis we work with p simultaneous quantifications
for each variable (or, to put it differently, with p-dimensional
quantifications). Let us collect them in kj x p matrices Yj, and
let us call these the multiple nominal quantifications of variable
j. Then the matrices GjYj induce m multiple quantifications of the
objects. Perfect homogeneity is attained if all multiple
quantifications of the objects are the same, say X (n x p), thus if
X = G1Y1 = ... = GmYm (cf. De Leeuw, 1973, chap. 2). Homogeneity
analysis minimizes the loss of homogeneity, with loss defined in
terms of squared deviations, over normalized object
quantifications:

(1)  $\min_{X,Y}\ \sigma(X,Y) = \sum_{j=1}^{m} \mathrm{SSQ}(X - G_j Y_j),$

subject to the condition that X'X = nI and u'X = 0,

where u is a column with n elements equal to one. The symbol SSQ(.)
is used for the sum of squares of the elements of a vector or matrix.
The condition u'X = 0 guarantees that X is in deviations from the
column means, while X'X = nI makes the columns of X uncorrelated,
with variances equal to one. Elements of X are called object
scores. At this point we do not go further into the formal
development of homogeneity analysis, or into computational
implementations. We come back to this at a later stage of the
paper.
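
The loss function and the normalization conditions can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names `normalize_scores` and `homogeneity_loss` are hypothetical, the data are random, and a QR decomposition is used here merely as one convenient way to obtain scores with X'X = nI and u'X = 0.

```python
import numpy as np

def normalize_scores(X):
    """Center the columns and orthonormalize, so that u'X = 0 and X'X = n I."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Q, _ = np.linalg.qr(Xc)            # orthonormal basis of the centered columns
    return np.sqrt(n) * Q

def homogeneity_loss(X, G_list, Y_list):
    """Loss (1): sum over variables of SSQ(X - G_j Y_j)."""
    return sum(float(np.sum((X - G @ Y) ** 2)) for G, Y in zip(G_list, Y_list))

rng = np.random.default_rng(0)
n, p = 12, 2
X = normalize_scores(rng.standard_normal((n, p)))
codes = [np.arange(n) % 3, np.arange(n) % 4]          # two hypothetical variables
G_list = [np.eye(int(c.max()) + 1)[c] for c in codes]  # their indicator matrices
Y_list = [np.linalg.pinv(G) @ X for G in G_list]       # category means of X
loss = homogeneity_loss(X, G_list, Y_list)
```

With category means as quantifications, each term of the loss is at most SSQ(X) = np, so the loss of this example lies between 0 and mnp.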

RANK-ONE RESTRICTIONS AND OPTIMAL SCALING 

In homogeneity analysis with the dimensionality p ≥ 2 we work
with multiple quantifications. Each dimension adds another
quantification of the categories of each variable, and the
different quantifications of the same variable have no simple
relation to each other. This sometimes makes interpretation
complicated, especially in the case of variables whose categories
have a clear ordinal or even numerical interpretation. For this
reason we introduce rank-one restrictions into homogeneity
analysis, which make it possible to have multidimensional solutions
for object scores with only a single quantification (or optimal
scaling) for categories. As another benefit, the use of rank-one
restrictions makes it possible to relate homogeneity analysis to
many of the classical multivariate techniques. Mathematically the
rank-one restriction (for variable j) is

(2)  $Y_j = z_j a_j',$

with zj the kj-vector of single category quantifications, and aj
the p-vector of weights. Thus the quantification matrix Yj is
restricted to be a rank-one matrix. The columns of Yj are all the
same, apart from weight factors.
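
A small numerical illustration of the rank-one restriction, with hypothetical values for the single quantifications and the weights:

```python
import numpy as np

z = np.array([-1.0, 0.0, 2.0])   # hypothetical single quantifications, k_j = 3
a = np.array([0.8, -0.5])        # hypothetical weights, p = 2

Y = np.outer(z, a)               # Y_j = z_j a_j', a k_j x p matrix
rank = np.linalg.matrix_rank(Y)  # equals 1 by construction
```

Every column of Y is the same vector z, multiplied by the corresponding weight in a.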

If no further conditions are imposed on the single quantifications
zj we call them single nominal. Incorporating prior ordinal
information on the categories can be done by requiring that the
elements of zj are in the appropriate order. This defines the
single ordinal treatment of a variable. Single numerical
restrictions can also be quite useful. We may require that zj is
linear in known scores for the categories. All these restrictions
are discrete, because variables have a restricted number of
categories. There are consequently many tied observations, and ties
in the data remain ties in the representation. In the continuous
treatment of variables, as in the primary approach to ties of
Kruskal (1964), ties can become untied. Because homogeneity
analysis is firmly based on the indicator matrix, it does not allow
untying of ties, and consequently our approach has no continuous
treatment of variables.

The combination of homogeneity analysis with the rank-one
restrictions defines a form of nonlinear principal component
analysis. We shall discuss this as one of the various special cases
below, but first we introduce the implementation of sets of
variables into homogeneity analysis.

SETS OF VARIABLES

In many applications of multivariate analysis the variables are
grouped in a natural way into sets of variables. Think of multiple
regression for instance, where one has a number of independent
variables, or of canonical correlation analysis. One way of dealing
with sets of variables in homogeneity analysis is by using
interactive coding, familiar from the analysis of variance.
Variables which belong together are collected as subvariables of
one interactive variable, and the analysis is applied to the
interactive codings instead of to the original variables.

For a set of r subvariables the interactive variable has
categories corresponding to all cells of the r-dimensional cross
table. Thus using interactive coding can rapidly lead to a very
large number of categories. For 5 subvariables with 5 categories
each, the interactive variable has 3125 categories, which is far too
many for any data analysis technique. Almost all cells will be empty,
especially if we cross this gigantic variable with others.
Nevertheless we may still feel that the subvariables really belong
together for the purposes of the analysis we are interested in, and
that they form a set of variables in a natural way. We can try to
avoid the empty cell problem by imposing additivity restrictions on
the interactive variables. In analysis of variance terminology this
means that we require that the category quantifications for the
interactive variables consist of main effects only, without
interactions between subvariables.
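
The difference in the number of category quantifications per dimension can be checked with a short computation. The sketch below only counts categories for the example of 5 subvariables with 5 categories each; the variable names are illustrative.

```python
from itertools import product

categories_per_subvariable = [5, 5, 5, 5, 5]

# Interactive coding: one category for every cell of the cross table.
cells = product(*[range(k) for k in categories_per_subvariable])
interactive = len(list(cells))

# Additive coding: the indicator matrices are concatenated,
# so the numbers of categories simply add up.
additive = sum(categories_per_subvariable)
```

Here `interactive` is 3125 and `additive` is 25, which shows how strongly the additivity restriction reduces the number of parameters.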
We now translate the above into mathematical notation. The index
set J = {1,...,m} for variables is partitioned into subsets
J(1),...,J(k), where k is the number of sets of variables. We use t
for the index indicating sets, thus in the sequel always t = 1,...,k.
The homogeneity analysis problem with k sets of variables is now
defined (De Leeuw, 1973) as

(3)  $\min_{X,Y}\ \sigma(X,Y) = \sum_{t=1}^{k} \mathrm{SSQ}\Bigl(X - \sum_{j \in J(t)} G_j Y_j\Bigr),$

subject to the condition that X'X = nI and u'X = 0.

Subvariables within sets are treated by (3) as additive. Thus,
conceptually, sets of variables are dealt with by first creating
interactive variables, and then by imposing additivity
restrictions. Therefore all within-set interactions vanish if
variables are coded as concatenated indicators. It is also possible
to require that only some within-set interactions vanish by leaving
some of the interactive codings intact. For instance a set with 4
variables can be coded as 6 concatenated indicators corresponding
with all pairs of variables, or as two concatenated indicators, the
first one corresponding with three subvariables, and the second one
with the remaining subvariable.

In the introduction we defined OVERALS as the combination of
homogeneity analysis with optimal scaling and additivity
restrictions. Now we are ready for a more formal definition. This
involves the combination of (2) and (3) into the problem

(4)  $\min_{X,Y}\ \sigma(X,Y) = \sum_{t=1}^{k} \mathrm{SSQ}\Bigl(X - \sum_{j \in J(t)} G_j Y_j\Bigr),$

subject to the condition that X'X = nI and u'X = 0, and
for some (sub)variables $Y_j = z_j a_j'$ and $z_j \in C_j$,

which is the definition of the OVERALS problem. In (4) we have used
the general notation zj ∈ Cj to indicate that there may be
measurement restrictions on the category quantifications
(numerical, ordinal, and nominal). The measurement level in (4) is
consequently mixed, not only because we can choose between single
nominal, single ordinal, and single numerical, but also because we
have multiple nominal as an option as well. We still consider (4)
as a form of homogeneity analysis, with restrictions, and we have
implemented a technique for solving the OVERALS problem in the
OVERALS computer program. In the following section we discuss the
algorithm used in this program.

THE OVERALS ALGORITHM 

In this section we explain how the OVERALS problem is solved by
using an alternating least squares (ALS) algorithm. First we solve
the multiple OVERALS problem, which is the OVERALS problem with all
measurement levels multiple nominal. Then we solve the general
OVERALS problem (with variables having multiple and/or single
measurement levels, from now on briefly called multiple and single
variables) by imposing rank-one restrictions on the multiple
quantifications corresponding with single variables.

First we introduce some notation which is more convenient than
the summation notation within sets used in (3) and (4). We write
all Gj corresponding with variables in set t next to each other in
the matrix Gt, and the Yj for set t above each other in Yt. Thus
GtYt is the sum of all GjYj in set t.

The stationary equations for the OVERALS problem (4) are the
following. The optimal object scores X, for given Yt, must satisfy
the equation

(5)  $X\Phi = M \sum_t G_t Y_t,$

with Φ a symmetric matrix of Lagrange multipliers, and
M = I - n^{-1}uu' the operator which transforms a vector into
deviations from the mean. Equation (5) is obtained by differentiating
the loss function with respect to X under the restrictions that
u'X = 0 and X'X = nI.

If we write Z for the right-hand side of (5), then premultiplying
both sides by their transposes gives nΦ² = Z'Z. Thus
Φ = (Z'Z/n)^{1/2}, and X = n^{1/2} Z(Z'Z)^{-1/2}. Computing the
optimal X is actually a form of the orthogonal Procrustes problem,
for which the solution is classical (Cliff, 1966). The right-hand
side of (5) is, apart from the factor k, the average of the multiple
transformed sets of variables, where each transformed set is the sum
of a number of transformed variables. The optimal matrix of object
scores is an orthogonalized version of this average.
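
The object score update can be sketched as follows. This is a minimal illustration, not the code of the OVERALS program: the function name `update_object_scores` is hypothetical, the data are random, and the inverse square root is computed from a symmetric eigendecomposition under the assumption that Z has full column rank.

```python
import numpy as np

def update_object_scores(G_sets, Y_sets):
    """X = sqrt(n) Z (Z'Z)^(-1/2), with Z = M sum_t G_t Y_t as in equation (5)."""
    Z = sum(G @ Y for G, Y in zip(G_sets, Y_sets))
    Z = Z - Z.mean(axis=0)                     # apply the centering operator M
    n = Z.shape[0]
    evals, evecs = np.linalg.eigh(Z.T @ Z)     # Z'Z is symmetric, p x p
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return np.sqrt(n) * Z @ inv_sqrt

rng = np.random.default_rng(1)
n, p = 20, 2
# Two hypothetical sets, each consisting of a single indicator matrix.
G_sets = [np.eye(3)[np.arange(n) % 3], np.eye(4)[np.arange(n) % 4]]
Y_sets = [rng.standard_normal((G.shape[1], p)) for G in G_sets]
X = update_object_scores(G_sets, Y_sets)
```

The result satisfies both normalization conditions: the columns of X are centered and X'X = nI.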

The optimal category quantification of variable j of set t is

(6)  $\hat{Y}_j = D_j^{-1} G_j' (X - V_{tj}),$ with

     $V_{tj} = G_t Y_t - G_j Y_j, \qquad D_j = G_j' G_j.$

In order to show that (6) does indeed give the optimal multiple
quantifications we write

(7)  $\mathrm{SSQ}(X - G_t Y_t) = \mathrm{SSQ}((X - V_{tj}) - G_j Y_j) =$
     $\mathrm{SSQ}((X - V_{tj}) - G_j \hat{Y}_j) + \mathrm{tr}\,(Y_j - \hat{Y}_j)' D_j (Y_j - \hat{Y}_j).$

Clearly the minimum over Yj is obtained by setting Yj equal to Ŷj.
The matrix Dj is diagonal, and contains the frequencies of the
different categories of variable j. The operator Dj^{-1}Gj' averages
over objects belonging to the same category, i.e. computes category
means. We average the object scores X minus a correction term Vtj
for the other variables in set t. Note that in the 'one variable in
each set' case, the correction term is zero. In that case the
optimal category quantification is the average or centroid of the
object scores of all objects in the category.
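
A minimal sketch of the quantification update (6), on a small hypothetical data set with one variable in its set, so that the correction term is zero. The function name `update_quantification` is an assumption made for this illustration.

```python
import numpy as np

def update_quantification(X, G_j, V_tj):
    """Equation (6): category means of X - V_tj, i.e. D_j^{-1} G_j' (X - V_tj)."""
    freq = G_j.sum(axis=0)                       # diagonal of D_j
    return (G_j.T @ (X - V_tj)) / freq[:, None]

codes = np.array([0, 0, 1, 2, 2, 2])             # categories of 6 objects
G = np.eye(3)[codes]                             # indicator matrix
X = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 5.0],
              [0.0, 1.0], [0.0, 2.0], [3.0, 3.0]])
V = np.zeros_like(X)                             # no other variables in the set
Y_hat = update_quantification(X, G, V)
```

Each row of `Y_hat` is the centroid of the object scores in one category; for the first category this is the mean of the first two rows of X.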

The two equations (5) and (6) illustrate the centroid principle,
which is one of the leading principles in correspondence analysis.
Category quantifications are centroids of object scores (with a
correction for other variables, if necessary), and object scores
are averages of quantified variables (with an orthogonalization, if
necessary). The multiple OVERALS problem is solved by an ALS
procedure which alternates step (5), combined with the Procrustes
orthogonalization, and step (6). The centroid principle in the
stationary equations (5) and (6) is implemented by a reciprocal
averaging algorithm.
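
The alternation of steps (5) and (6) can be sketched as a short NumPy loop. This is an illustrative reimplementation under stated assumptions, not the FORTRAN program: the names `procrustes`, `set_sum`, and `multiple_overals` are hypothetical, a fixed number of iterations replaces a convergence test, the variables within a set are updated one after another, and the data are random indicator matrices.

```python
import numpy as np

def procrustes(Z):
    """Center Z and return sqrt(n) Z (Z'Z)^(-1/2), so that X'X = n I."""
    Z = Z - Z.mean(axis=0)
    evals, evecs = np.linalg.eigh(Z.T @ Z)
    return np.sqrt(len(Z)) * Z @ evecs @ np.diag(evals ** -0.5) @ evecs.T

def set_sum(G_list, Y_list):
    """Sum of G_j Y_j over the variables of one set."""
    return sum(G @ Y for G, Y in zip(G_list, Y_list))

def multiple_overals(sets, p, n_iter=50, seed=0):
    """sets: list of sets, each a list of n x k_j indicator matrices."""
    rng = np.random.default_rng(seed)
    n = sets[0][0].shape[0]
    X = procrustes(rng.standard_normal((n, p)))          # random start
    Y = [[np.zeros((G.shape[1], p)) for G in s] for s in sets]
    losses = []
    for _ in range(n_iter):
        for t, s in enumerate(sets):
            for j, G in enumerate(s):
                # V_tj: contribution of the other variables in set t
                V = set_sum(s, Y[t]) - G @ Y[t][j]
                Y[t][j] = (G.T @ (X - V)) / G.sum(axis=0)[:, None]   # step (6)
        X = procrustes(sum(set_sum(s, Yt) for s, Yt in zip(sets, Y)))  # step (5)
        losses.append(float(sum(np.sum((X - set_sum(s, Yt)) ** 2)
                                for s, Yt in zip(sets, Y))))
    return X, Y, losses

rng = np.random.default_rng(0)
n = 30

def indicator(k):
    """Random indicator matrix in which every one of k categories occurs."""
    return np.eye(k)[rng.permutation(np.arange(n) % k)]

sets = [[indicator(3), indicator(3)], [indicator(4)]]    # two hypothetical sets
X, Y, losses = multiple_overals(sets, p=2)
```

Because every substep is a least squares minimization over one block of parameters with the others fixed, the recorded loss values cannot increase from one iteration to the next.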
The general OVERALS problem is solved by the multiple OVERALS
algorithm with an extra inner iteration step for single variables
(i.e. variables with rank-one restrictions) added. The inner
iteration step consists of the estimation of weights and single
category quantifications; again it alternates two steps of an inner
ALS procedure. We could continue the inner iterations until
convergence before proceeding with outer iterations again, but
computational experience has indicated that performing only one
inner iteration is generally more efficient (cf. Takane, Young, and
De Leeuw, 1980).

The multiple category quantifications (6) are computed for all
variables, both multiple and single. Weights and single category
quantifications are solved for each single variable separately. In
order to show how this must be done optimally, we use the
partitioning of the sum of squares in (7), assuming now that Ŷj is
the currently optimal multiple quantification, and zj the current
single quantification. Thus



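(8)  $\mathrm{SSQ}(X - G_t Y_t) = \mathrm{SSQ}((X - V_{tj}) - G_j z_j a_j') =$
     $\mathrm{SSQ}((X - V_{tj}) - G_j \hat{Y}_j) + \mathrm{tr}\,(\hat{Y}_j - z_j a_j')' D_j (\hat{Y}_j - z_j a_j').$

Define

(9)  $\hat{a}_j = (z_j' D_j z_j)^{-1} \hat{Y}_j' D_j z_j.$

The last term of (8) can now be written as

(10) $\mathrm{tr}\,(\hat{Y}_j - z_j a_j')' D_j (\hat{Y}_j - z_j a_j') =$
     $\mathrm{tr}\,(\hat{Y}_j - z_j \hat{a}_j')' D_j (\hat{Y}_j - z_j \hat{a}_j') + (z_j' D_j z_j)(a_j - \hat{a}_j)'(a_j - \hat{a}_j),$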
which shows that âj is optimal. In the same way we can define

(11) $\hat{z}_j = (a_j' a_j)^{-1} \hat{Y}_j a_j,$

and write

(12) $\mathrm{tr}\,(\hat{Y}_j - z_j a_j')' D_j (\hat{Y}_j - z_j a_j') =$
     $\mathrm{tr}\,(\hat{Y}_j - \hat{z}_j a_j')' D_j (\hat{Y}_j - \hat{z}_j a_j') + (a_j' a_j)(z_j - \hat{z}_j)' D_j (z_j - \hat{z}_j).$

Now Ŷj and aj are the current values of the multiple category
quantifications and the weights, respectively. We see from (12)
that (11) is optimal for single nominal variables. For single
ordinal variables the transformations are obtained by using
monotone regression (MR), with weights Dj, on the single nominal
solution. Compare also Young (1981). The regression is based on the
original ordering of the categories in the data matrix. Thus for
single ordinal the optimum is

(13a) $\hat{z}_j = \mathrm{MR}\bigl((a_j' a_j)^{-1} \hat{Y}_j a_j\bigr),$

and for single numerical transformations we use linear regression
(LR) instead. Thus

(13b) $\hat{z}_j = \mathrm{LR}\bigl((a_j' a_j)^{-1} \hat{Y}_j a_j\bigr).$
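
The single nominal update (11) and the single ordinal update (13a) can be sketched as follows. The monotone regression is written here as a plain pool-adjacent-violators routine for a non-decreasing fit; this choice, the function names, and the small example values are assumptions made for illustration, and the routine is not taken from the OVERALS program.

```python
import numpy as np

def monotone_regression(target, weights):
    """Weighted least squares fit of a non-decreasing sequence (pool adjacent violators)."""
    blocks = []   # each block: [weighted mean, total weight, number of elements]
    for value, w in zip(target, weights):
        blocks.append([float(value), float(w), 1])
        # merge the last two blocks while they violate the required order
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])

def single_update(Y_hat, a, freq, level="nominal"):
    """Single quantifications for given weights a: (11), followed by (13a) if ordinal."""
    z = Y_hat @ a / (a @ a)                  # single nominal solution (11)
    if level == "ordinal":
        z = monotone_regression(z, freq)     # weights are the category frequencies
    return z

Y_hat = np.array([[-1.0, -0.5], [0.2, 0.1], [0.0, 0.0], [1.0, 0.5]])
a = np.array([2.0, 1.0])
freq = np.array([1.0, 1.0, 1.0, 1.0])
z_nominal = single_update(Y_hat, a, freq)
z_ordinal = single_update(Y_hat, a, freq, level="ordinal")
```

In this example the nominal solution is (-0.5, 0.1, 0.0, 0.5); the second and third categories violate the order and are pooled to their weighted mean 0.05 in the ordinal solution.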

Summarizing the OVERALS algorithm, we have an alternating least
squares procedure that estimates the object scores, including the
orthogonalization (equation 5), and for each variable the multiple
category quantifications (equation 6). If there are single
variables, the single category quantifications and the weights are
also estimated in a separate ALS procedure, of which one step is
carried out in each major iteration. Then (6) is followed by (9),
(11) and (13).

RELATIONSHIP BETWEEN OVERALS AND EIGENVALUE PROBLEMS 

In this section we discuss the OVERALS loss function for the
multiple case, and for the general mixed case, in a bit more detail.
We do this to relate the technique to various more familiar concepts
from linear multivariate analysis. More specifically, we want to
investigate whether and to what extent OVERALS solves
eigenvector-eigenvalue problems.

Let us start with multiple OVERALS. Remember that Gj was the
indicator matrix of variable j, and Gt was the supermatrix
containing all Gj in set t, obtained by writing the Gj next to each
other. It follows directly from (4) that the minimum of the loss
over the Yt, for fixed X, is attained at Ŷt = [Gt]^+ X, with [.]^+
denoting the Moore-Penrose inverse. Substituting in (4) gives

(14) $\sigma(X, *) = \sum_t \mathrm{tr}\, X'(I - P_t)X,$

with Pt = Gt[Gt]^+ the orthogonal projector on the subspace spanned
by the columns of Gt. Minimization of (14) over X, subject to the
normalization conditions specified in (4), gives the stationary
equation

(15) $\sum_t \{M P_t M\} X = k X \Phi,$
again with Φ a symmetric matrix of Lagrange multipliers. This shows
that the optimal X is a basis for the eigenspace spanned by the p
principal eigenvectors of the matrix MP*M, with P* the average of
the projectors Pt. The minimum loss is given by

(16) $\sigma(*,*) = nkp\,\{1 - p^{-1} \sum_s \lambda_s\},$

with λs the p largest eigenvalues of MP*M (and also of Φ). This
shows that solving the multiple OVERALS problem corresponds to
solving the eigenvalue problem for MP*M, and that the minimum loss
is a linear function of the average of the p largest eigenvalues.
In fact it suffices to consider the eigenvalue problem for P*, as
MP*M is the deflated P* matrix with the first trivial eigenvector,
which has all elements the same, removed. The eigenvalue problem
could also be solved directly, by using a Jacobi or
Householder-Givens algorithm, but this is quite impractical in many
situations, because the number of objects can be very large indeed.
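
For a small number of objects the eigenvalue formulation can be checked directly. The following sketch, with the hypothetical function name `multiple_overals_eig` and random indicator matrices, computes the p principal eigenvectors of MP*M and compares the loss formula (16) with the loss (14) evaluated at that solution. It assumes that the p largest eigenvalues are positive.

```python
import numpy as np

def multiple_overals_eig(G_sets, p):
    """Solve multiple OVERALS through the eigenvalue problem for M P* M."""
    n, k = G_sets[0].shape[0], len(G_sets)
    M = np.eye(n) - np.ones((n, n)) / n                        # centering operator
    P_star = sum(G @ np.linalg.pinv(G) for G in G_sets) / k    # average projector
    evals, evecs = np.linalg.eigh(M @ P_star @ M)              # ascending order
    lam = evals[::-1][:p]                                      # p largest eigenvalues
    X = np.sqrt(n) * evecs[:, ::-1][:, :p]                     # scaled so X'X = n I
    min_loss = n * k * p * (1.0 - lam.mean())                  # equation (16)
    return X, lam, min_loss

rng = np.random.default_rng(0)
n = 30

def indicator(k):
    """Random indicator matrix in which every one of k categories occurs."""
    return np.eye(k)[rng.permutation(np.arange(n) % k)]

# Set-level matrices G_t: the first set concatenates two indicator matrices.
G_sets = [np.hstack([indicator(3), indicator(3)]), indicator(4)]
X, lam, min_loss = multiple_overals_eig(G_sets, p=2)

# Loss (14) evaluated at the eigenvector solution.
direct = sum(float(np.sum((X - G @ np.linalg.pinv(G) @ X) ** 2)) for G in G_sets)
```

The two values `direct` and `min_loss` agree up to rounding error, and the eigenvalues lie between zero and one because P* is an average of orthogonal projectors.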

It is of considerable interest to observe that instead of
solving the eigenvalue problem for P* in order to find the optimal
X, we can also solve the generalized eigenvalue problem for the
pair (C, kD) in order to find the optimal Y. Here C is the Burt
matrix of the problem, defined by C = G'G, with G having all Gj
next to each other (or, what amounts to the same thing, all Gt next
to each other). Matrix C contains the bivariate cross tables of all
pairs of variables. Compare Gifi (1981, p. 62), or Greenacre (1984,
p. 140). Matrix D is block-diagonal: it is the direct sum of the
Gt'Gt. Thus the optimal Y satisfies

(17) $C Y = k D Y \Phi.$

The proof is short. Because P*X = k^{-1} G D^+ G'X = XΦ and
D^+ G'X = Y, we have GY = kXΦ. Premultiplying both sides with
D^+ G' gives D^+ CY = kYΦ, which is (17) premultiplied by D^+. Using
(17) may be, at least in some situations, an attractive way to
compute the optimal solutions of the homogeneity analysis problem
with sets of variables. In other cases, however, this generalized
eigenvalue problem may simply be too large. Moreover, the whole
development only applies if all variables are treated as multiple.
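
The relation between the two eigenvalue problems can be illustrated numerically: the nonzero eigenvalues of P* = k^{-1} G D^+ G' coincide with those of k^{-1} D^+ C. The sketch below uses random indicator matrices and a small hypothetical helper `block_diag`; the largest eigenvalue is the trivial one, equal to one.

```python
import numpy as np

def block_diag(blocks):
    """Direct sum of square matrices."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    pos = 0
    for b in blocks:
        out[pos:pos + b.shape[0], pos:pos + b.shape[0]] = b
        pos += b.shape[0]
    return out

rng = np.random.default_rng(0)
n = 30

def indicator(k):
    """Random indicator matrix in which every one of k categories occurs."""
    return np.eye(k)[rng.permutation(np.arange(n) % k)]

G_sets = [np.hstack([indicator(3), indicator(3)]), indicator(4)]
k = len(G_sets)
G = np.hstack(G_sets)
C = G.T @ G                                                  # Burt matrix
D_plus = block_diag([np.linalg.pinv(Gt.T @ Gt) for Gt in G_sets])

# Eigenvalues of the small problem D^+ C / k (real up to rounding error).
small = np.sort(np.linalg.eigvals(D_plus @ C).real)[::-1] / k
# Eigenvalues of the large problem P* = G D^+ G' / k.
large = np.linalg.eigvalsh(G @ D_plus @ G.T)[::-1] / k
```

The small problem has the order of the total number of categories, whereas the large problem has the order of the number of objects.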

For OVERALS with single quantifications only we follow a similar
procedure to study the optimal solutions. We introduce some new
notation to do this efficiently. Define, for each variable, the
vector qj = Gjzj. The qj are normalized induced scores for objects,
or transformed variables. They are organized as columns of matrices
Qt, one for each set. In a similar way the weight vectors aj are
organized as rows of matrices At. We may rewrite the OVERALS
problem (4), supposing that all variables are single, as

(18) $\min\ \sigma(X, Q, A) = \sum_t \mathrm{SSQ}(X - Q_t A_t),$

subject to the condition that X'X = nI and u'X = 0,
and $z_j \in C_j$.

Now problem (18) is very closely related to our previous OVERALS
problem (4). We merely have to replace Gt in our previous formulas
by Qt, and Yt by At. But this means that (15) also applies with
Pt = Qt[Qt]^+. Also σ(*,Q,*) is defined as in (16) from the
eigenvalues of the average projector P*. If we write all Qt next to
each other in Q, then we can also compute σ(*,Q,*) as in (16) from
the generalized eigenvalues of C = Q'Q with respect to k times the
direct sum of the Qt'Qt. There is one considerable difference
between (18) and its predecessors, however. The vectors qj are
functions of the zj, which means that the average projector P* and
the Burt matrix C are a function of the single category
quantifications as well. Thus we can write

(19) $\sigma(*, Q, *) = nkp\,\{1 - p^{-1} \sum_s \lambda_s(Q)\}.$

Result (16) shows that multiple OVERALS amounts to computing
eigenvalues of a given matrix; result (19) shows that single
OVERALS means choosing single quantifications of the variables in
such a way that the sum of the p largest eigenvalues is maximized.
Of course Q is constant if all variables happen to be single
numerical.

We can now combine our results so far to obtain the
interpretation of the minimum loss for the mixed case, in which
some variables are single and some are multiple. But we shall
introduce a somewhat different terminology, which makes the
comparison more interesting. We use the notion that a multiple
variable can be considered as a number of copies of a single
variable. Or, somewhat differently, a multiple variable is really a
set of single variables. This idea is due to De Leeuw (1983, 1984).

Suppose Yj is a given multiple quantification. We can decompose
Yj, a matrix with kj rows and p columns, in many different ways in
the form Yj = ZjAj. One solution simply takes the columns of Zj as
the normalized version of the columns of Yj, and takes Aj equal to
the diagonal matrix of standard deviations of these columns. But Zj
could also be an orthogonalized version of Yj, with Aj symmetric or
upper-triangular, and so on. In any case the decomposition can be
written as

(20a) $Y_j = \sum_r z_{jr} a_{jr}',$

and thus

(20b) $G_j Y_j = \sum_r q_{jr} a_{jr}'.$

Here index r is used for the columns of Zj and the rows of Aj in
the decomposition of Yj. If there are pj such rows, then (20a) and
(20b) show that having a multiple variable is equivalent to
having pj single variables with the same indicator matrix Gj, i.e.
pj copies. Note that in general we can take pj ≤ min(p, kj).
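
The first decomposition mentioned above can be sketched as follows. As a simplifying assumption the columns are normalized to unit Euclidean length rather than to unit (frequency-weighted) variance, and the matrix Y is a hypothetical example.

```python
import numpy as np

Y = np.array([[1.0, 2.0], [0.0, -1.0], [-1.0, -1.0], [3.0, 0.5]])  # k_j = 4, p = 2

norms = np.linalg.norm(Y, axis=0)
Z = Y / norms              # columns of Y scaled to unit length
A = np.diag(norms)         # diagonal matrix of the scale factors

# Equation (20a): Y is the sum of the rank-one terms z_jr a_jr'.
rank_one_terms = [np.outer(Z[:, r], A[r, :]) for r in range(A.shape[0])]
reconstructed = sum(rank_one_terms)
```

Each rank-one term corresponds to one copy of the variable, that is, to one single variable with the same indicator matrix.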

By using the idea of copies we reduce the mixed problem, with
both single and multiple variables, to the single OVERALS problem,
and we can use the interpretation of this problem in terms of
eigenvalues of the Burt tables and average projectors defined by
means of the Qt given above. An additional benefit of the use of
copies is that it becomes easy to define multiple ordinal and
multiple numerical variables. We can fix the measurement level of
each of the factors in the decomposition separately. Thus we can,
for instance, use one variable three times in its set, once ordinal
and two times nominal. If all copies of a variable are ordinal, then
it is multiple ordinal. This opens many new possibilities, but we
merely outline them here, because the use of copies is not yet
implemented in the program OVERALS. If one wants to use the notion
of copies in the program, one actually has to create the copies in
the data set.

We have shown in this section that OVERALS can be interpreted in
terms of eigenvalue problems. In the mixed multiple and single
numerical case these eigenvalue problems could be defined
completely in terms of the data. OVERALS then becomes the
simultaneous iteration method for computing a few of the dominant
eigenvalues of a matrix, and it consequently converges to the
global minimum of its loss function (Rutishauser, 1969). In the
other cases the eigenvalue problem varied with the single
quantifications, and we had to choose the quantifications in such a
way that the dominant eigenvalues were maximized. This is a
nonlinear problem, which may have many local minima. We do not know
how serious the local minimum problem is. All nonlinear
multivariate analysis problems, except the eigenvalue problems,
have to take the existence of local minima into account. The little
research that has been done, by Segijn (1985) and Kuhfeld (1985) in
the PRINCALS/PRINQUAL framework, shows that local minima do not
appear to be a serious problem. But it is not known how general
this finding is.



THE COMPUTER PROGRAM OVERALS 



The OVERALS algorithm as described above has been implemented in
a computer program which is also called OVERALS (Verdegaal, 1986).
It has been developed at the Department of Data Theory by the
authors of the article, and it has been written in FORTRAN.

In the OVERALS program three initializations are performed. The
object scores X are initialized by using random values (the user
determines p). For single variables the quantifications are set
equal to the standardized versions of the original data. The
multiple category quantifications are initialized as zero. The
program starts by computing a solution which has all multiple
variables multiple nominal and all single variables single
numerical. After convergence of these initial iterations the
measurement levels of the single variables are adjusted to the
types specified by the user, and the iterations are restarted. This
strategy seems to prevent the occurrence of local minima rather
effectively, at least in the case in which the measurement level of
the variables is single ordinal. A random initialization for the
category quantifications is also possible. In case of single
nominal variables we advise the use of one or several random
starts.

In the program the iteration process is stopped when the loss
difference between consecutive main steps is small enough. The user
may define 'small enough'.

Another feature of the OVERALS program is the final rotation.
After convergence the object scores X and the category
quantifications Yj are rotated in such a way that the X are the
eigenvectors of the matrix MP*M, and not merely a rotation of these
eigenvectors. The eigenvalues of this matrix, which are called the
generalized canonical correlations by De Leeuw (1984), are a
measure of the goodness-of-fit of OVERALS. To find some indication
for the significance of these statistics, De Leeuw and Van der Burg
(1985) have studied their permutation distribution. They found that
the significance testing methods they developed seemed to work
rather well, but their study has a somewhat limited scope.

GEOMETRY OF OVERALS 

In the preceding sections we have discussed object scores and
multiple and single category quantifications. How do we interpret
the values of these parameters geometrically? Let us make pictures
in p-dimensional space (in practice, of course, we can only plot
two- or three-dimensional projections of these pictures). The
object scores X define a cloud of n points in this space, with unit
variance in all directions. The projections on the different
dimensions are uncorrelated.

We can compute the centroids of the objects which correspond to
the same category of each variable (cf. Figure 7). We call these
values the category centroids, in formula the rows of Dj^{-1}Gj'X. In
general these centroids are different from the multiple category
quantifications given in (6), except if there is only one variable
in the set. If we put category centroids and multiple category
quantifications together in one plot, we can see the influence of
the other variables in the set.
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The single category quantifications Zj, together with the 
weights aj, can be used to construct the rank-one quantifications. 
By plotting the nultlole category Quantifications and the rank-one 
quantlflcat. ^ja j 1 In a single plot, we see the effect of the 
rank-one restrictions. The rank-one quantifications are on a line 
through the origin, with direction cosines proportional to aj« The 
transformed variables qj • GjZj can be correlated with the object 
scores X to produce the component loadings Cj . The name Is chosen 
In analoqy with principal component analysis. They can be depicted 
as vectors representing transformed variables in the space of the 
object scores {cf. Figure 2). We can also plot, In the same space, 
the average rank-one quantifications ZjCj', which are the 
projections of each category into the space of object scores (cf. 
Figure 4). These are different from the because the Cj are 

the correlations of qj with X, while the aj are the correlations of 
qj with X - V t j. Thus again the difference Is the contribution of 
the other variables. 
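
The quantities of this section can be computed in a few lines. The Python sketch below uses made-up data for one variable with three categories; the function names are hypothetical, every category is assumed to occur at least once (so that Dj is invertible), and the variables are standardized explicitly so that the loadings are ordinary correlations.

```python
import numpy as np

def indicator_matrix(codes, n_categories):
    """Gj: one row per object, a single 1 in the column of its category."""
    G = np.zeros((len(codes), n_categories))
    G[np.arange(len(codes)), codes] = 1.0
    return G

def category_centroids(codes, X, n_categories):
    """Rows of Dj^-1 Gj'X: the mean object score in each category."""
    G = indicator_matrix(codes, n_categories)
    D = G.T @ G                            # diagonal matrix of frequencies
    return np.linalg.solve(D, G.T @ X)

def component_loadings(codes, z, X):
    """Correlations of the transformed variable qj = Gj zj with X."""
    q = z[codes]
    q = (q - q.mean()) / q.std()
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return Xs.T @ q / len(q)

def rank_one_quantifications(z, a):
    """zj aj': every category lies on the line through the origin along a."""
    return np.outer(z, a)

# Hypothetical data: six objects, one variable with three categories.
codes = np.array([0, 0, 1, 2, 2, 2])
X = np.array([[1.0, 0.0], [3.0, 2.0], [0.0, 1.0],
              [2.0, 2.0], [4.0, 0.0], [0.0, 4.0]])
z = np.array([-1.0, 0.0, 1.5])             # single category quantifications

centroids = category_centroids(codes, X, 3)
loadings = component_loadings(codes, z, X)
average_rank_one = rank_one_quantifications(z, loadings)
```

Passing the weights aj instead of the loadings to rank_one_quantifications gives the rank-one quantifications themselves; in both cases the rows are collinear, which is what the plots show as points on a line through the origin.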

In two-sets canonical correlation analysis it is more usual to 
show plots of the canonical variables for both sets, which are the 
GtYt, than of the object scores. If there are only two sets, G1Y1 
and G2Y2 are orthogonal, and related by a diagonal transformation. 
If the number of sets is larger the canonical variables are no 
longer orthogonal, and they may differ more fundamentally. 
Therefore we prefer object score plots, but one can, of course, 
plot canonical variables for each of the sets if this seems 
desirable. 

RELATIONSHIP WITH OTHER MULTIVARIATE TECHNIQUES 

It is interesting to consider the relationship between the 
OVERALS technique and other linear and nonlinear multivariate 
techniques. We can be brief about the relationship with homogeneity 
analysis. If each set contains only one variable, and all variables 
are multiple nominal, then OVERALS is identical to homogeneity 
analysis. This special case has been implemented in the program 
HOMALS (Van de Geer, 1985). If there are only two variables, and 
both these variables are multiple nominal, then OVERALS is 
equivalent to correspondence analysis. 

If each set contains only one variable, but the measurement 
levels are mixed, then OVERALS defines a form of nonlinear 
principal component analysis. This technique has been implemented 
in a separate program PRINCALS (Gifi, 1985). The related PRINCIPALS 
program of Young, Takane, and De Leeuw (1978) does not have 
multiple options, but can handle continuous variables. PRINCIPALS 
is now implemented in PRINQUAL (Kuhfeld, Sarle, and Young, 1985). 
If all variables are single numerical, and each set contains only 
one variable, OVERALS becomes ordinary principal component 
analysis. 

If there are two sets of variables we move into the realm of 
canonical correlation analysis. In fact, if all variables are 
considered single numerical, OVERALS becomes equivalent to ordinary 
canonical correlation analysis. If only one interactive variable is 
reduced to a set of variables by using additivity restrictions, 
while the other interactive variable is left intact (coding 
treatment effects), we can use OVERALS to perform multivariate 
analysis of variance. If one set of single variables is combined 
with a set containing one multiple nominal variable (coding a 
partition of the objects), we can perform canonical discriminant 
analysis. An OVERALS of two sets of single variables is very close, 
but not exactly identical, to the nonlinear canonical correlation 
technique CANALS proposed by Van der Burg and De Leeuw (1983), and 
Van der Burg (1983). CANALS is an improvement of MORALS/CORALS 
proposed by Young, De Leeuw, and Takane (1976). 

Canonical analysis techniques with k sets of variables were 
proposed in the single numerical case by many authors. Two early 
contributors are Horst (1961) and Carroll (1968). Kettenring 
(1971), Gifi (1981, chapter 6), and Van de Geer (1986, part IV) 
provide reviews. It is possible to think of OVERALS, with all 
variables single, as a nonlinear generalization of one of these 
generalized forms of canonical correlation analysis. In fact it is 
a k-set canonical correlation analysis with optimal scaling. The 
difficulty with this interpretation (from the didactical point of 
view) is the step from single OVERALS to OVERALS with both single 
and multiple quantifications. This step is not very natural, and we 
need the notion of codes to bridge the gap between multiple and 
single (cf. section on the relationship of OVERALS with eigenvalue 
problems). Therefore we have chosen the alternative route of 
starting with homogeneity analysis, and introducing OVERALS by 
discussing the use of additivity and rank-one restrictions. For the 
other route, via generalized canonical correlation analysis, we 
refer to Van der Burg, De Leeuw, and Verdegaal (1984). 

APPLICATION OF OVERALS 

The data of this study are based on field surveys on chronic 
lung disease, carried out at three-year intervals between 1972 and 
1982 in the Netherlands (Van der Lende et al., 1981; Van Pelt et 
al., 1985). The locations were a rural area, Vlagtwedde, and an 
industrial town, Vlaardingen, the latter having a much higher grade 
of air pollution. The residents of both towns have been questioned, 
amongst other things, about their smoking behaviour, their 
respiratory symptoms and their personal background. The smoking 
behaviour has been operationalized by four variables: SMO, RATE, 
PER, and TIME; respiratory symptoms by five variables: COU, PHLE, 
DYS, WHE, and AST. As background variables we used SEX and AGE. The 
residence is denoted by RES. The variables and the meaning of the 
categories are given in Table 1. 

INSERT TABLE 1 ABOUT HERE 

There are 2870 individuals sampled from a data base of 3959 
individuals under 56 years of age. Starting from the distribution 
of AGE for the total data base, we sampled four groups (denoted MR 
= men from rural Vlagtwedde, MI = men from industrial Vlaardingen, 
and WR, WI for the women) with identical AGE-distributions, so that 
there exists no correlation between AGE and SEX x RES. This was 
done to avoid trivial relationships, mainly between AGE and RES (on 
the average people in rural areas are older). 
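
One way to obtain groups with identical AGE-distributions is sketched below in Python. The paper does not describe the sampling procedure in detail, so this scheme (taking, within each AGE category, as many individuals from every group as the smallest group offers) is an assumption, as are the function name and the synthetic data.

```python
import numpy as np

def sample_equal_age_distribution(group, age, rng):
    """Return row indices such that every group contributes the same
    number of individuals to each age category."""
    groups = np.unique(group)
    chosen = []
    for a in np.unique(age):
        members = [np.flatnonzero((group == g) & (age == a)) for g in groups]
        quota = min(len(m) for m in members)   # limited by the smallest group
        if quota == 0:
            continue
        for m in members:
            chosen.extend(rng.choice(m, size=quota, replace=False))
    return np.sort(np.array(chosen, dtype=int))

# Synthetic stand-in for the data base: 4 SEX x RES groups, 10 AGE categories.
rng = np.random.default_rng(0)
group = rng.integers(0, 4, size=400)
age = rng.integers(0, 10, size=400)

idx = sample_equal_age_distribution(group, age, rng)
table = np.array([[np.sum((group[idx] == g) & (age[idx] == a))
                   for a in range(10)] for g in range(4)])
```

In the resulting cross-table every row (group) has the same AGE counts, so AGE is unrelated to the grouping variable in the sample.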

The goal of the OVERALS analysis was to find a common space in 
the four sets determined by the respiratory symptoms, smoking 
behaviour, personal background, and residency. 

We did four analyses, starting with 2870 individuals and all 
variables single nominal, except AGE which was taken as single 
ordinal. The same analysis was repeated for men and women 
separately. Finally another analysis on all 2870 individuals was 
performed, but now the variables AGE and SEX were combined to one 
interactive variable AGE x SEX, taken as multiple nominal, and the 
other variables were taken as single nominal. We considered only 
two-dimensional solutions. We discuss the results of the analyses 
with the help of plots. We show transformations of several 
variables (Figure 1), component loadings (Figures 2, 5, and 6), 
object scores (Figures 3 and 7), and average rank-one 
quantifications (Figure 4). In addition we have two tables which 
give correlations (Table 2) and eigenvalues (Table 3). We do not 
show the weights as they are difficult to interpret due to the fact 
that they 'incorporate' the correlations with the other variables 
in the set (cf. Geometry of OVERALS, or Thorndike, 1977). 



INSERT FIGURE 1 ABOUT HERE 

INSERT TABLE 2 ABOUT HERE 

An overall impression of the first analysis (men & women I) is 
obtained from the component loadings (Figure 2). However, before we 
are able to interpret this figure we have to study the 
transformations of the variables. We find that the single nominal 
restriction for most variables results in almost ordinal 
transformations. The exceptions are the smoking behaviour variables 
RATE, PER and TIME. Transformation plots of all smoking variables 
and of AGE are given in Figure 1. The violations of ordinality 
occur mainly in the first categories of RATE, PER and TIME, which 
correspond to people who have never smoked. Due to the nonlinear 
transformations of the variables we expect differences between the 
correlations before and after transformation (respectively upper 
and lower triangle of Table 2). However, the overall structure of 
the correlation matrix does not change a great deal, except for the 
submatrix of smoking habits. They form a tight cluster before 
transformation (mainly related to sex). After transformation they 
split up into age-related smoking habits (PER and TIME) and 
sex-related smoking habits (SMO and RATE). This is mainly because of 
the quantification for the non-smokers category. 

INSERT FIGURE 2 ABOUT HERE 

The component loadings, which are the correlations between 
object scores and transformed variables, are plotted as vectors in 
Figure 2. They point towards a high quantification. As we have 
seen, this means that they point to individuals having high 
category numbers for all variables. We only have to keep in mind 
that the categories for non-smokers are quantified around zero, and 
that ex-smokers and current smokers have the same quantification in 
this solution. The component loadings are interpreted in the usual 
way. Thus a high age corresponds to a long period of smoking and to 
severe dyspnoea. The respiratory symptoms, except DYS, are much 
more related to SEX than to AGE. As the vectors for symptoms and 
SEX point into opposite directions their relationship is negative. 
Thus in this sample men more often have symptoms than women. The 
SEX-vector and the SMO-vector are opposite too, thus also men in 
this sample are more often ex-smokers than women. 

INSERT FIGURE 3 ABOUT HERE 

In addition to plotting variables we plotted individuals by 
their object scores (Figure 3). Together with the object scores we 
present the 90-percentile contours (equiprobability ellipses) of 
each of the four SEX x RES groups MR, MI, WR, and WI. The figure 
shows that men differ from women, and also that the difference 
between Vlagtwedde and Vlaardingen is larger for women than for 
men. To obtain more insight in the plot of object scores with 
respect to the other variables we projected single category 
quantifications of all variables onto the space of object scores 
(Figure 4). Above we referred to these projections as average 
rank-one quantifications. The categories of the variables lie on 
lines with the same direction as the vectors of Figure 2. To keep 
Figures 3 and 4 legible, they have been plotted with different 
scales. In Figure 4 the categories are indicated by the first (or 
first two) letters of their variable name and their category number 
(RE = RES, S = SMO, R = RATE, P = PER, T = TIME, A = AGE, SE = SEX, 
C = COU, PH = PHLE, D = DYS, W = WHE, AS = AST). Only the 
categories in the middle are left out of the plot. Thus categories 
which are missing in the plot have quantifications near zero. 

INSERT FIGURE 4 ABOUT HERE 

Figure 4 shows how the categories are quantified, and tells how 
to interpret the object scores. For instance at the left, above the 
middle, we see categories for older people (AGE-categories A9 and 
A10) who most likely have smoked for a long time already 
(PER-categories P8 to P13), or who stopped smoking long ago (T3 and 
T4, category T2 does not occur), and probably with a severe 
dyspnoea (D3). This means that we find object scores for people 
characterized in this way at the left side of Figure 3. In the 
slightly oblique vertical direction Figure 4 shows no variation in 
AGE but much variation in the respiratory symptoms COU, PHLE and 
WHE, in the smoking variables RATE and SMO, in SEX and in RES. In 
the lower part of Figure 4 we find categories for people with 
respiratory symptoms (C2, PH2, W2, W3), most probably men (SE1) 
living in Vlaardingen (RE2) who smoke(d) a lot (S2, S3, R7, R8, 
R9). In the upper part we find categories for females (SE2) and for 
never smokers (S1) or very light smokers (R2, R3, R4). Most likely 
they have no respiratory symptoms (W1, and C1, PH1 in the center). 
Thus in the plot of the object scores we find healthier people, 
apart from having dyspnoea, more at the top. They are more often 
women than men, do not smoke or lightly so, live more in Vlagtwedde 
than Vlaardingen, and are found in all AGE categories. 

Differences between men and women with respect to smoking habits 
and respiratory symptoms are a dominant feature in this solution. 
We therefore reanalyzed the data separately for men and women. We 
present the plots of component loadings in Figures 5 and 6. Note 
that the two plots are on the same scale. In both cases the 
respiratory symptoms (except DYS) are independent from AGE, and 
strongly related to RATE. Compared to Figure 2, the variable DYS 
has moved away from AGE, apparently because we have controlled 
for SEX. In fact shortage of breath (DYS) occurs equally often in 
women as in men and correlates mainly with age. It also correlates 
with the other symptoms, but in the two-dimensional solution of 
males and females together there was no 'place' to show that. 

INSERT FIGURES 5 AND 6 ABOUT HERE 

Figures 5 and 6 show that the smoking period, PER, correlates 
more with AGE for men than for women. Also we see that SMO has a 
different direction and length for the two solutions. This is a 
reflection of the fact that between 1972 and 1982 most older women 
do not smoke, whereas the never smokers among the males are usually 
the younger ones. 

Another difference between the solutions for men and women is in 
the role of residence. For men this variable is totally 
unexplained, for women it is very pronounced in the solution. The 
respiratory symptoms correlate with the rate of smoking for both 
men and women, but they only correlate with residence for women 
(Figures 5 and 6). This indicates that fewer women in Vlagtwedde 
smoke than in Vlaardingen, or they smoke less. It seems therefore 
that the difference in smoking behaviour between males and females, 
and between the two residences among females, is a more important 
predictor than place of living as such. 

Up till now we found a strong effect of AGE (independent from 
symptoms, except DYS) both in the total analysis and in the 
separate analyses for men and women. We also found a large 
difference between males and females. Therefore we reanalyzed the 
data, but in this case with the interactive variable AGE x SEX 
taken as multiple nominal (men & women II). The results confirm the 
conclusions of the first analysis. We show the categories of AGE x 
SEX (M1, ..., M10, W1, ..., W10) in the space of object scores 
(Figure 7). Each category point is in the centroid of (the object 
scores of) all individuals scored in that particular category. The 
quantifications form a letter V bent leftwards. In fact north-west 
is still the direction of increasing age, and north-east still the 
direction of SEX-difference. Categories M1 and W1 overlap, W2 and 
W3 have changed order, as have W9 and W10, and M9 and M10. But the 
interchanges are, on the whole, minor. The category quantifications 
of the other variables are very similar to those of Figure 4, so we 
do not show them. Although there is an interaction effect between 
SEX and AGE (the younger females and males differ less from each 
other than the older ones do) we can easily describe the effect by 
two separate variables, as the results of the two analyses do not 
differ substantially. 



INSERT FIGURE 7 ABOUT HERE 



Summarizing the four analyses we can say that we found a 
relationship between smoking behaviour and respiratory symptoms for 
both males and females. Only for women did we also find an effect 
of residence with respect to respiratory symptoms. This effect can 
be reduced to a difference in smoking habits between women from 
Vlaardingen and Vlagtwedde. Sex is correlated with both symptoms 
and smoking behaviour. Age is mostly related to smoking variables 
with a time effect, such as TIME and PER. The symptoms are not 
related to age (in the age range we have considered), except 
shortage of breath. We found an interaction effect between AGE and 
SEX. Younger people differ less in symptoms and smoking habits than 
older people do. The nonlinear transformation of the variables 
(first analysis) has mostly affected the smoking habit variables. 
Mainly due to the quantification for the category non-smokers the 
cluster of smoking habits falls apart after transformation. For 
completeness we finish this application with an overview of the 
generalized canonical correlations (Table 3). Perfect homogeneity 
corresponds with a correlation of 1, and no relation at all with a 
canonical correlation of 1/k. From Table 3 it can be seen that for 
men the first dimension is much more important than the second one. 
For the other analyses the two dimensions are more of equal 
importance. 
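
The two reference values 1 and 1/k can be checked numerically. The Python sketch below assumes that the generalized canonical correlations are the eigenvalues of the average of the k orthogonal projectors on the spaces spanned by the sets; the function names and the small example matrices are hypothetical, and centering is omitted to keep the example short.

```python
import numpy as np

def projector(H):
    """Orthogonal projector on the column space of H (full column rank)."""
    Q, _ = np.linalg.qr(H)
    return Q @ Q.T

def generalized_canonical_correlations(sets):
    """Eigenvalues, largest first, of the average projector over the sets."""
    P_star = sum(projector(H) for H in sets) / len(sets)
    return np.sort(np.linalg.eigvalsh(P_star))[::-1]

identity = np.eye(6)

# k = 3 sets spanning mutually orthogonal spaces: no relation at all.
orthogonal_sets = [identity[:, 0:2], identity[:, 2:4], identity[:, 4:6]]
unrelated = generalized_canonical_correlations(orthogonal_sets)

# k = 3 sets spanning the same space: perfect homogeneity.
identical_sets = [identity[:, 0:2]] * 3
homogeneous = generalized_canonical_correlations(identical_sets)
```

With mutually orthogonal sets the largest eigenvalue is 1/3, with identical sets it is 1, which matches the bounds stated above for k = 3.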

We emphasize that this example is only a tiny demonstration of 
the capabilities of OVERALS. There are so many choices and options 
in the program that we can never cover the complete range of 
possibilities. We refer to Gifi (1981) for other examples. Many 
applications of special cases of OVERALS can be found throughout 
that book. 



DISCUSSION AND EXTENSIONS 



The OVERALS algorithm opens many possibilities in data analysis. 
It covers most of the usual linear and nonlinear multivariate 
analysis techniques. But this generality comes at a price. In the 
first place there is the possibility of local minima in some of the 
more complicated special cases. It is necessary to study the 
seriousness of this problem in more detail in the future. In the 
second place we do not have information on the stability of the 
results. For several special cases of OVERALS (two variables, or k 
sets each with one variable) research has been done; however, for 
the more general cases of OVERALS not very much is known. De Leeuw 
and Van der Burg (1985) make a start by means of randomization 
methods. They compare several methods and obtain promising results. 
They investigate the stability of generalized canonical correlation 
in a small study. More work in this direction has been planned. Van 
der Burg and De Leeuw (1985) have investigated ways of computing 
confidence regions for the OVERALS results. For this they use the 
Delta method combined with the Jackknife. Their results are 
encouraging, but still very preliminary. 
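
To give an idea of what a randomization approach looks like, the following Python sketch computes a permutation p-value for the largest generalized canonical correlation of numerical sets. It is a generic permutation scheme and not the procedure of De Leeuw and Van der Burg (1985); the function names, the synthetic data, and the choice to permute the rows of all sets but the first are assumptions.

```python
import numpy as np

def largest_gcc(sets):
    """Largest eigenvalue of the average projector on the centered sets."""
    projectors = []
    for H in sets:
        Q, _ = np.linalg.qr(H - H.mean(axis=0))
        projectors.append(Q @ Q.T)
    return np.linalg.eigvalsh(sum(projectors) / len(sets))[-1]

def permutation_p_value(sets, n_permutations, rng):
    """Compare the observed statistic with its permutation distribution."""
    observed = largest_gcc(sets)
    count = 0
    for _ in range(n_permutations):
        # Permuting rows within a set breaks its link with the other sets.
        shuffled = [sets[0]] + [H[rng.permutation(len(H))] for H in sets[1:]]
        if largest_gcc(shuffled) >= observed:
            count += 1
    return observed, (count + 1) / (n_permutations + 1)

# Synthetic example: two sets that share one strong common dimension.
rng = np.random.default_rng(1)
t = rng.standard_normal(60)
set_1 = np.column_stack([t + 0.1 * rng.standard_normal(60),
                         rng.standard_normal(60)])
set_2 = np.column_stack([t + 0.1 * rng.standard_normal(60),
                         rng.standard_normal(60)])

observed, p_value = permutation_p_value([set_1, set_2], 199, rng)
```

For OVERALS proper the statistic would have to be recomputed by rerunning the whole analysis, including the optimal scaling, on every permuted data set.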

Another apparent disadvantage of the OVERALS method is the fact 
that it can only handle complete data matrices. We did not discuss 
missing values in this article. The computer program OVERALS does 
handle missing data, however, on the basis of equations given by 
Gifi (1981, Chap. 6). Verdegaal (1985, 1986) gives an extensive 
discussion of the OVERALS program with missing data. 

The nonlinear transformations in OVERALS are a real extension of 
the usual linear transformations in multivariate analysis. However, 
the transformations we use are necessarily step functions. This can 
be a disadvantage in some cases. To make transformations more 
smooth we can, for instance, use splines. De Leeuw, Van 
Rijckevorsel, and Van der Wouden (1981) have implemented splines in 
the principal component algorithm. We plan to integrate these 
transformations into OVERALS as well. 

With these extensions OVERALS can effectively be applied in even 
more data analysis situations. 

TABLE 1 

Variables from the study of chronic lung disease. 

set 1  RES:  Residence, (1) Vlagtwedde, (2) Vlaardingen. 

set 2  SMO:  Smoking, (1) never smoker, (2) ex-smoker, (3) current 
             smoker. 
       RATE: Rate of smoking (amount of tobacco), (1) never smoker, 
             (2) low rate, ..., (9) high rate. 
       PER:  Smoking period, (1) never smoker, (2) short period, ..., 
             (13) long period. 
       TIME: Time since last cigarette, (1) never smoker, (2) long 
             ago, ..., (5) recently, (6) current smoker. 

set 3  AGE:  Age discretized into periods of 3.5 years, (1) age 
             19 - 22.5, ..., (10) age 52.5 - 56. 
       SEX:  Sex, (1) male, (2) female. 

set 4  COU:  Coughing, (1) no, (2) persistent. 
       PHLE: Phlegm, (1) no, (2) persistent. 
       DYS:  Dyspnoea or shortage of breath, (1) no, (2) slight/ 
             moderate, (3) severe. 
       WHE:  Wheezing, (1) never, (2) ever, (3) severe. 
       AST:  Asthma, (1) ever, (2) never. 



TABLE 2 

Correlations before and after transformation, respectively upper and lower 
triangle, men and women I. 

        RES   SMO  RATE   PER  TIME   AGE   SEX   COU  PHLE   DYS   WHE   AST
RES           .00   .04   .03   .02   .00  -.06   .09   .11   .05   .04   .04
SMO     .04         .75   .71   .97  -.07  -.32   .18   .12   .02   .17  -.02
RATE    .02   .03         .64   .74  -.03  -.39   .25   .18   .10   .20  -.01
PER     .02   .01   .26         .73   .41  -.43   .19   .15   .14   .18   .01
TIME   -.07   .03   .38   .17        -.08  -.34   .16   .11   .02   .16  -.01
AGE     .00  -.06   .01   .67  -.15         .00   .06   .07   .22   .08   .04
SEX    -.06  -.35  -.23  -.26   .03   .00        -.10  -.09   .06  -.06   .01
COU     .09   .15   .20   .12   .05   .06  -.10         .53   .24   .31   .17
PHLE    .11   .10   .16   [?]   .04   .07  -.09   .53         .25   .31   .13
DYS     .05   .02   .12   .19   .01   .23   .06   .23   .24         .33   .20
WHE     .04   .15   .13   .09   .04   .07  -.06   .28   .27   .29         .31
AST     .04   .00  -.02   .02  -.04   .04   .01   .17   .13   .19   .32

[?] = entry illegible in the source. 


TABLE 3 

Generalized Canonical Correlations. 

                    Dimension
                    1      2
men & women I     .469   .390
men               .510   .317
women             .426   .362
men & women II    .486   .398

Figure 1. Transformations of smoking behaviour variables and AGE, 
men & women I. 

Figure 2. Component loadings, men & women I. 

Figure 3. Object scores and 90-percent contours for SEX x RES, men 
& women I. (M = men, W = women, R = Vlagtwedde, I = 
Vlaardingen). 

Figure 4. Average rank-one quantifications, men & women I. (RE = 
RES, S = SMO, R = RATE, P = PER, T = TIME, A = AGE, SE = 
SEX, C = COU, PH = PHLE, D = DYS, W = WHE, AS = AST, 
1, ..., 10 = category numbers). 

Figure 5. Component loadings, men. 

Figure 6. Component loadings, women. 

Figure 7. Object scores and category centroids for AGE x SEX, men 
& women II. (M = men, W = women, 1, ..., 10 = age 
categories). 